Anticipating Rewards in Continuous Time and Space with Echo State Networks and Actor-Critic Design
Author
Abstract
In this paper we implement an echo state network within an actor-critic design to obtain an optimal control policy for a mobile robot. The robot is required to anticipate future rewards/punishments and react accordingly. Experimental results show that the proposed approach is simple and effective.
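The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: a fixed random reservoir (the echo state network) summarizes the input history, and a trainable linear readout on the reservoir state serves as the critic, updated by the TD error. All names, sizes, and learning rates below are illustrative assumptions, not the authors' settings.

```python
import numpy as np

# Minimal echo state network with a trainable linear readout used as a
# critic. Reservoir size, spectral radius, and leak rate are assumed
# values for illustration only.
class EchoStateNetwork:
    def __init__(self, n_inputs, n_reservoir=200, spectral_radius=0.9,
                 leak_rate=0.3, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.5, 0.5, (n_reservoir, n_inputs))
        W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
        # Rescale the recurrent weights so the echo state property is
        # likely to hold (spectral radius below 1).
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W
        self.W_out = np.zeros(n_reservoir)  # critic readout, the only trained part
        self.leak = leak_rate
        self.x = np.zeros(n_reservoir)

    def step(self, u):
        # Leaky-integrator reservoir update; the reservoir itself stays fixed.
        pre = self.W_in @ np.asarray(u) + self.W @ self.x
        self.x = (1.0 - self.leak) * self.x + self.leak * np.tanh(pre)
        return self.W_out @ self.x  # value estimate V(s)


def td_update(esn, reward, v_prev, x_prev, gamma=0.95, alpha=1e-3):
    # One TD(0) critic update: move the readout along the TD error.
    # An analogous second readout could parameterize the actor.
    v = esn.W_out @ esn.x
    delta = reward + gamma * v - v_prev  # TD error, anticipating future reward
    esn.W_out += alpha * delta * x_prev  # credit the previous reservoir state
    return delta
```

Only the readout weights are adapted, which is what keeps reservoir approaches simple compared with fully trained recurrent networks.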
Similar Papers
Totally Model-Free Reinforcement Learning by Actor-Critic Elman Networks in Non-Markovian Domains
In this paper we describe how an actor-critic reinforcement learning agent in a non-Markovian domain finds an optimal sequence of actions in a totally model-free fashion; that is, the agent neither learns transitional probabilities and associated rewards, nor by how much the state space should be augmented so that the Markov property holds. In particular, we employ an Elman-type recurrent neural ne...
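As a rough sketch of the recurrent architecture this related work relies on (names and sizes are assumptions, not the paper's), an Elman-type network feeds the previous hidden state back through context units, letting the agent accumulate history so that a non-Markovian observation stream becomes effectively Markovian in the hidden state:

```python
import numpy as np

# Minimal Elman-style recurrent network: context units hold a copy of the
# previous hidden state, so the output can depend on observation history.
class ElmanNet:
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_xh = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.W_hh = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # context weights
        self.W_hy = rng.normal(0.0, 0.1, (n_out, n_hidden))
        self.h = np.zeros(n_hidden)  # context units (previous hidden state)

    def step(self, x):
        self.h = np.tanh(self.W_xh @ np.asarray(x) + self.W_hh @ self.h)
        return self.W_hy @ self.h  # e.g. action preferences or a value estimate
```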
A Model-Based Actor-Critic Algorithm in Continuous Time and Space
This paper presents a model-based actor-critic algorithm in continuous time and space. Two function approximators are used: one learns the policy (the actor) and the other learns the state-value function (the critic). The critic learns with the TD(λ) algorithm and the actor by gradient ascent on the Hamiltonian. A similar algorithm had been proposed by Doya, but this one is more general. This al...
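The TD(λ) critic mentioned above is standard; a hedged sketch of one eligibility-trace update for a linear critic over features phi(s) follows (step sizes and discounting are assumed values, and the actor's gradient ascent on the Hamiltonian is omitted):

```python
import numpy as np

# One TD(lambda) step for a linear critic V(s) = w . phi(s), using an
# accumulating eligibility trace e to assign credit to recent features.
def td_lambda_step(w, e, phi, phi_next, reward,
                   gamma=0.98, lam=0.9, alpha=1e-2):
    delta = reward + gamma * (w @ phi_next) - (w @ phi)  # TD error
    e = gamma * lam * e + phi                            # decay, then accumulate
    w = w + alpha * delta * e                            # update along the trace
    return w, e
```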
Application of reinforcement learning to balancing of acrobot
The acrobot is a two-link robot, actuated only at the joint between the two links. It is one of the difficult tasks in reinforcement learning (RL) to control the acrobot, because it has nonlinear dynamics and continuous state and action spaces. In this article, we discuss applying RL to the task of balancing control of the acrobot. Our RL method has an architecture similar to the actor-critic. The ...
Actor-Critic Reinforcement Learning with Neural Networks in Continuous Games
Reinforcement learning agents with artificial neural networks have previously been shown to acquire human-level dexterity in discrete video game environments, where only the current state of the game and a reward are given at each time step. A harder problem is posed by continuous environments, where the states, observations, and actions are continuous, which is what th...
On using discretized Cohen-Grossberg node dynamics for model-free actor-critic neural learning in non-Markovian domains
We describe how multi-stage non-Markovian decision problems can be solved using actor-critic reinforcement learning, by assuming that a discrete version of Cohen-Grossberg node dynamics describes the node-activation computations of a neural network (NN). Our NN (i.e., agent) is capable of rendering the process Markovian implicitly and automatically, in a totally model-free fashion, without learning...
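For context, Cohen-Grossberg dynamics have the standard form dx_i/dt = a_i(x_i)[b_i(x_i) − Σ_j c_ij d_j(x_j)]; a simple Euler discretization is sketched below, where the particular choices of a, b, d and the step size are assumptions for illustration, not the paper's:

```python
import numpy as np

# Euler-discretized Cohen-Grossberg node dynamics:
#   x_i(t+1) = x_i(t) + dt * a_i(x_i) * (b_i(x_i) - sum_j C[i, j] * d_j(x_j))
# Here a is taken as constant, b as linear decay, and d as tanh (assumptions).
def cohen_grossberg_step(x, C, dt=0.1):
    a = np.ones_like(x)   # amplification function a_i(x_i)
    b = -x                # self-signal b_i(x_i)
    d = np.tanh(x)        # node output d_j(x_j)
    return x + dt * a * (b - C @ d)
```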
Journal:
Volume / Issue:
Pages: -
Publication date: 2011